Constructing a static call graph requires a trade-off between soundness and precision. Unfortunately, the program analysis techniques used to construct call graphs are often imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture the information in the call graph by extracting structural features for a random forest classifier; it then removes edges that are predicted to be false positives. Although machine learning models have shown improvements, they remain limited because they do not consider source code semantics and thus often fail to effectively distinguish true and false positives. In this paper, we present AutoPruner, a novel call graph pruning technique that eliminates false positives in call graphs via both statistical semantic and structural analysis. Given a call graph constructed by a conventional static analysis tool, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus to represent source code based on descriptions of its semantics. The model is then used to extract semantic features from the functions associated with each edge in the call graph. AutoPruner uses these semantic features, together with structural features extracted from the call graph, to classify each edge via a feed-forward neural network. Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms state-of-the-art baselines, improving F-measure by up to 13% in identifying false-positive edges in a static call graph.
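A minimal sketch (assuming PyTorch; dimensions and feature choices are illustrative, not taken from the paper) of the classification step described above: semantic features from a pre-trained code model are concatenated with structural call-graph features and scored by a feed-forward classifier for each candidate edge.

```python
import torch
import torch.nn as nn

class EdgeClassifier(nn.Module):
    """Feed-forward classifier over concatenated semantic + structural features.

    Dimensions are placeholders chosen for illustration only.
    """
    def __init__(self, semantic_dim: int = 768, structural_dim: int = 8, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(semantic_dim + structural_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),   # keep edge vs. prune edge (false positive)
        )

    def forward(self, semantic_feat: torch.Tensor, structural_feat: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([semantic_feat, structural_feat], dim=-1))

# Usage: semantic_feat would come from a fine-tuned code model (e.g. an embedding
# of the caller/callee pair), structural_feat from the call graph itself.
clf = EdgeClassifier()
logits = clf(torch.randn(4, 768), torch.randn(4, 8))  # 4 candidate edges
```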
Knowledge graph (KG) alignment, the task of identifying entities in different KGs that refer to the same thing, is regarded as one of the most important operations in the area of KG construction. However, existing alignment techniques usually assume that the input KGs are complete and isomorphic, which does not hold in practice due to real-world heterogeneity in domain, size, and sparsity. In this work, we address the problem of aligning incomplete KGs with representation learning. Our KG embedding framework exploits two feature channels: a transitivity-based channel and a proximity-based channel. The former captures consistency constraints between entities via translation paths, while the latter captures the neighborhood structure of a KG via an attention-guided, relation-aware graph neural network. The two feature channels are learned jointly to exchange important features between the input KGs while enforcing that the output representations of the input KGs lie in the same embedding space. In addition, we develop a missing-link detector that discovers and recovers missing links in the input KGs during training, which helps mitigate the incompleteness problem and thereby improves the compatibility of the learned representations. The embeddings are then fused to generate the alignment result, and high-confidence matched node pairs are added to the pre-aligned supervision data to gradually improve the embeddings. Empirical results show that our model is up to 15.2% more accurate than the state of the art and is robust to different levels of incompleteness. We also demonstrate that the knowledge exchanged between the KGs helps reveal unseen facts in the knowledge graphs (a.k.a. knowledge completion), with results 3.5% higher than state-of-the-art knowledge graph completion techniques.
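The two feature channels can be pictured with the toy sketch below (my own illustration, not the authors' code): a TransE-style translation score for the transitivity-based channel, and a simple attention-weighted neighbor aggregation for the proximity-based channel.

```python
import torch

def transe_score(head: torch.Tensor, rel: torch.Tensor, tail: torch.Tensor) -> torch.Tensor:
    """Translation-based consistency: a smaller ||h + r - t|| means a more
    plausible triple; summing scores along a relation path gives a path-level
    consistency constraint between entities."""
    return torch.norm(head + rel - tail, p=1, dim=-1)

def neighborhood_embed(entity: torch.Tensor, neighbors: torch.Tensor,
                       attn_logits: torch.Tensor) -> torch.Tensor:
    """Very rough proximity channel: fold in neighbor embeddings weighted by
    attention scores to capture the local structure around an entity."""
    attn = torch.softmax(attn_logits, dim=0)              # (num_neighbors,)
    return entity + (attn.unsqueeze(-1) * neighbors).sum(dim=0)

# Example with random embeddings of dimension 32 and 5 neighbors.
h, r, t = torch.randn(32), torch.randn(32), torch.randn(32)
print(transe_score(h, r, t))
print(neighborhood_embed(h, torch.randn(5, 32), torch.randn(5)).shape)
```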
In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it also captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; instead, it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classifies 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
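The invariant-based overfitting check reduces to simple set reasoning over inferred invariants. The sketch below (helper names and set representation are hypothetical, used only to make the two conditions from the abstract concrete) encodes that logic.

```python
def is_overfitting(patched_invariants: set[str],
                   correct_specs: set[str],
                   error_behaviors: set[str]) -> bool:
    """A patch is flagged as overfitting if it (1) violates any correct
    specification of the original program, or (2) still exhibits erroneous
    behaviors of the original buggy program.

    The sets stand in for likely invariants inferred on the buggy and patched
    programs (e.g. Daikon-style inference); names are illustrative only.
    """
    violates_correct_spec = not correct_specs.issubset(patched_invariants)
    keeps_error_behavior = bool(error_behaviors & patched_invariants)
    return violates_correct_spec or keeps_error_behavior

# Example: the patched program drops a correct spec -> flagged as overfitting.
print(is_overfitting({"x >= 0"}, {"x >= 0", "y != null"}, {"index > length"}))
```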
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although some approaches exist to address this problem, they usually require global channel state information, which is hard to obtain in practice, and yield sub-optimal power allocation policies at high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q-learning (PQL) algorithm for MADRL systems. By introducing regularization terms into the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, and thus the policy updating speed slows down. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that the proposed PQL can learn the desired power control policy in a dynamic environment where the locations of users change episodically, and that it outperforms existing DTE MADRL algorithms.
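The abstract does not give the exact form of the regularization terms; the sketch below shows one plausible reading (assuming PyTorch): a standard DQN-style TD loss plus a proximal penalty toward a frozen copy of the previous Q-network, which slows policy updates and keeps agents favoring previously experienced high-reward actions.

```python
import torch
import torch.nn.functional as F

def pql_loss(q_net, q_old, target_net, batch, gamma=0.99, penalty_coef=0.1):
    """Illustrative penalty-based Q-learning loss (not the paper's exact form).

    q_net:      the Q-network being trained
    q_old:      a frozen copy of the Q-network from the previous update round
    target_net: the usual DQN target network
    batch:      (states, actions, rewards, next_states, done_flags) tensors
    """
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    td_loss = F.mse_loss(q_sa, target)
    # Proximal penalty: discourage large changes to the policy so that other
    # agents can predict (and adapt to) this agent's behavior more easily.
    penalty = F.mse_loss(q_net(s), q_old(s).detach())
    return td_loss + penalty_coef * penalty
```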
For solving a broad class of nonconvex programming problems on an unbounded constraint set, we provide a self-adaptive step-size strategy that does not rely on line-search techniques, and we establish the convergence of a generic approach under mild assumptions. Specifically, the objective function may not satisfy the convexity condition. Unlike descent line-search algorithms, the method does not need a known Lipschitz constant to determine how large the first step should be. The crucial feature of this process is the steady reduction of the step size until a certain condition is fulfilled. In particular, it yields a new gradient projection approach for optimization problems with an unbounded constraint set. The correctness of the proposed method is verified by preliminary results on some computational examples. To demonstrate the effectiveness of the proposed technique on large-scale problems, we apply it to experiments in machine learning, such as supervised feature selection, multivariable logistic regression, and neural networks for classification.
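A schematic of the idea (my own simplification, not the paper's algorithm): a gradient projection loop in which the step size is only ever reduced, one shrink at a time, whenever a simple decrease test fails; the actual condition used in the paper may differ.

```python
import numpy as np

def adaptive_projected_gradient(f, grad, project, x0, step0=1.0, shrink=0.5,
                                tol=1e-6, max_iter=10_000):
    """Gradient projection with a self-adaptive, monotonically shrinking step.

    No line search and no Lipschitz constant: when a trial point fails a
    (simplified) decrease test, the step is shrunk once and kept thereafter.
    """
    x, step = x0, step0
    for _ in range(max_iter):
        g = grad(x)
        x_new = project(x - step * g)
        if np.linalg.norm(x_new - x) < tol:
            break
        if f(x_new) >= f(x):      # simplified condition; illustration only
            step *= shrink        # steady, monotone reduction of the step size
            continue
        x = x_new
    return x

# Example: minimize ||x - c||^2 over the (unbounded) nonnegative orthant.
c = np.array([1.0, -2.0, 3.0])
sol = adaptive_projected_gradient(lambda x: np.sum((x - c) ** 2),
                                  lambda x: 2 * (x - c),
                                  lambda x: np.maximum(x, 0.0),
                                  x0=np.zeros(3))
print(sol)   # -> approximately [1., 0., 3.]
```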
Machine Learning as a Service (MLaaS) permits resource-limited clients to access powerful data analytics services ubiquitously. Despite its merits, MLaaS poses significant concerns regarding the integrity of delegated computation and the privacy of the server's model parameters. To address this issue, Zhang et al. (CCS'20) initiated the study of zero-knowledge Machine Learning (zkML). Few zkML schemes have been proposed since; however, they focus on single ML classification algorithms, which may not offer satisfactory accuracy or may require large-scale training data and model parameters, which may be undesirable for some applications. We propose ezDPS, a new efficient and zero-knowledge ML inference scheme. Unlike prior works, ezDPS is a zkML pipeline in which the data is processed in multiple stages for high accuracy. Each stage of ezDPS is harnessed with an established ML algorithm that has been shown to be effective in various applications, including Discrete Wavelet Transformation, Principal Component Analysis, and Support Vector Machines. We design new gadgets to prove ML operations effectively. We fully implemented ezDPS and assessed its performance on real datasets. Experimental results show that ezDPS is one to three orders of magnitude more efficient than the generic circuit-based approach in all metrics, while maintaining more desirable accuracy than single ML classification approaches.
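Setting the zero-knowledge proof machinery aside, the plaintext pipeline the abstract names (DWT, then PCA, then SVM) can be sketched with PyWavelets and scikit-learn as below; the wavelet choice and hyperparameters are placeholders, not values from the paper.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC

def dwt_features(X: np.ndarray) -> np.ndarray:
    """Level-1 discrete wavelet transform per sample; keep both sub-bands."""
    coeffs = [np.concatenate(pywt.dwt(x, "db1")) for x in X]
    return np.stack(coeffs)

# DWT -> PCA -> SVM: the (plaintext) stages named in the abstract.
pipeline = make_pipeline(
    FunctionTransformer(dwt_features),
    PCA(n_components=16),        # placeholder dimensionality
    SVC(kernel="rbf"),
)
# pipeline.fit(X_train, y_train); pipeline.predict(X_test)
```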
This case study investigates the extent to which a language model (GPT-2) is able to capture native speakers' intuitions about implicit causality in a sentence completion task. We first reproduce earlier results (showing lower surprisal values for pronouns that are congruent with either the subject or object, depending on which one corresponds to the implicit causality bias of the verb), and then examine the effects of gender and verb frequency on model performance. Our second study examines the reasoning ability of GPT-2: is the model able to produce more sensible motivations for why the subject VERBed the object if the verbs have stronger causality biases? We also developed a methodology to avoid human raters being biased by obscenities and disfluencies generated by the model.
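Surprisal of a candidate continuation (for example, a pronoun congruent or incongruent with the verb's implicit causality bias) under GPT-2 can be computed as below with the HuggingFace transformers library; the example sentence is my own, not one of the study's stimuli.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def surprisal(context: str, continuation: str) -> float:
    """Surprisal (in bits) of `continuation` given `context` under GPT-2."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, cont_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    total = 0.0
    for i, tok in enumerate(cont_ids[0]):
        # Logits at position (ctx_len + i - 1) predict the i-th continuation token.
        total += -log_probs[0, ctx_ids.size(1) + i - 1, tok].item()
    return total / torch.log(torch.tensor(2.0)).item()   # nats -> bits

print(surprisal("Mary fascinated John because", " she"))
print(surprisal("Mary fascinated John because", " he"))
```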
Pareto Front Learning (PFL) was recently introduced as an effective approach to obtain a mapping function from a given trade-off vector to a solution on the Pareto front, which solves the multi-objective optimization (MOO) problem. Due to the inherent trade-off between conflicting objectives, PFL offers a flexible approach in many scenarios in which the decision makers cannot specify the preference of one Pareto solution over another and must switch between them depending on the situation. However, existing PFL methods ignore the relationship between the solutions during the optimization process, which hinders the quality of the obtained front. To overcome this issue, we propose a novel PFL framework, namely \ourmodel, which employs a hypernetwork to generate multiple solutions from a set of diverse trade-off preferences and enhances the quality of the Pareto front by maximizing the Hypervolume indicator defined by these solutions. The experimental results on several MOO machine learning tasks show that the proposed framework significantly outperforms the baselines in producing the trade-off Pareto front.
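A toy sketch (assuming PyTorch; all sizes are illustrative) of the hypernetwork idea: a preference vector is mapped to the full weight set of a small target network, so sampling several preference vectors yields several solutions whose joint hypervolume with respect to a reference point can then be maximized during training (e.g. with a library hypervolume indicator).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    """Maps a trade-off (preference) vector to the weights of a tiny target MLP."""
    def __init__(self, n_objectives: int = 2, in_dim: int = 4, hidden: int = 16, out_dim: int = 1):
        super().__init__()
        self.in_dim, self.hidden, self.out_dim = in_dim, hidden, out_dim
        n_params = in_dim * hidden + hidden + hidden * out_dim + out_dim
        self.body = nn.Sequential(nn.Linear(n_objectives, 64), nn.ReLU(),
                                  nn.Linear(64, n_params))

    def forward(self, pref: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        p = self.body(pref)                                   # flat weight vector
        i = 0
        w1 = p[i:i + self.in_dim * self.hidden].view(self.hidden, self.in_dim)
        i += self.in_dim * self.hidden
        b1 = p[i:i + self.hidden]; i += self.hidden
        w2 = p[i:i + self.hidden * self.out_dim].view(self.out_dim, self.hidden)
        i += self.hidden * self.out_dim
        b2 = p[i:]
        h = torch.relu(F.linear(x, w1, b1))
        return F.linear(h, w2, b2)

net = HyperNet()
y = net(torch.tensor([0.3, 0.7]), torch.randn(8, 4))   # one preference -> one solution
```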
According to the World Federation of the Deaf, more than two hundred sign languages exist. It is therefore challenging to understand deaf individuals, even proficient sign language users, resulting in a barrier between the deaf community and the rest of society. To bridge this language barrier, we propose a novel multilingual communication system, namely MUGCAT, to improve the communication efficiency of sign language users. By converting recognized hand gestures into expressive pictures, which are universally understood and language-independent, our MUGCAT system significantly helps deaf people convey their thoughts. To overcome the limitation that recognized sign language can mostly not be translated into complete sentences for ordinary people, we propose to reconstruct meaningful sentences from the incomplete translation of sign language. We also measure the semantic similarity of the generated sentences against the fragments recognized from hand gestures to preserve the original meaning. Experimental results show that the proposed system can work in real time and synthesize exquisite illustrations and meaningful sentences from a few hand gestures of sign language. This shows that MUGCAT has promising potential for assisting deaf communication.
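The semantic-similarity check between a reconstructed sentence and the recognized gesture fragments could, for instance, be done with sentence embeddings as sketched below; the embedding model and example text are my own choices, not necessarily what MUGCAT uses.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

fragments = "want buy ticket train"                # glosses from recognized gestures
candidate = "I would like to buy a train ticket."  # reconstructed full sentence

emb = model.encode([fragments, candidate], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(similarity)   # keep the candidate only if the similarity is high enough
```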
Neural approaches have become very popular in the domain of Question Answering; however, they require a large amount of annotated data. Furthermore, they often yield very good performance, but only in the domain they were trained on. In this work, we propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings, where the target domains are diverse in terms of difficulty and similarity to the source domain. We also investigate Active Learning for question answering at different stages, overall reducing the annotation effort of humans. For this purpose, we consider target domains in realistic settings, with an extremely low amount of annotated samples but many unlabeled documents, which we assume can be obtained with little effort. Additionally, we assume that a sufficient amount of labeled data from the source domain is available. We perform extensive experiments to find the best setup for incorporating domain experts. Our findings show that our novel approach, where humans are incorporated as early as possible in the process, boosts performance in the low-resource, domain-specific setting, allowing for low-labeling-effort question answering systems in new, specialized domains. They further demonstrate how human annotation affects the performance of QA depending on the stage at which it is performed.
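One standard way to decide which automatically generated question-answer pairs humans should review next is uncertainty sampling; the sketch below is a generic active-learning selection step, not the paper's exact criterion.

```python
import numpy as np

def select_for_annotation(confidences: np.ndarray, budget: int) -> np.ndarray:
    """Uncertainty sampling: send the `budget` least-confident QA predictions
    (e.g. the model's probability for its best answer span) to human annotators.
    This is a generic active-learning step used for illustration only."""
    return np.argsort(confidences)[:budget]

# Example: model confidences for 6 generated question-answer pairs.
conf = np.array([0.92, 0.41, 0.77, 0.33, 0.88, 0.55])
print(select_for_annotation(conf, budget=2))   # -> indices of the 2 least confident
```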